Deterministic Sample Sort For GPUs

نویسندگان

  • Frank Dehne
  • Hamidreza Zaboli
چکیده

We demonstrate that parallel deterministic sample sort for many-core GPUs (GPU Bucket Sort) is not only considerably faster than the best comparison-based sorting algorithm for GPUs (Thrust Merge [Satish et.al., Proc. IPDPS 2009]) but also as fast as randomized sample sort for GPUs (GPU Sample Sort [Leischner et.al., Proc. IPDPS 2010]). However, deterministic sample sort has the advantage that bucket sizes are guaranteed and therefore its running time does not have the input data dependent fluctuations that can occur for randomized sample sort.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sorting On A Graphics Processing Unit(GPU)

2.1 Graphics Processing Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 2.2 Sorting Numbers on GPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 2.2.1 SDK Radix Sort Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 2.2.1.1 Step 1–Sorting tiles ...

متن کامل

Implicit radix sorting on GPUs

In this chapter, we present a high performance sorting function on GPUs that is able to exploit the parallel processing power and memory bandwidth of modern GPUs to sort large quantities of data at a very high speed. We revisit the traditional radix sorting framework, analyze the weaknesses, and then propose a solution based on the implicit counting data presentation and its associated operatio...

متن کامل

Efficient Primitives and Algorithms for Many-core architectures

OF THE DISSERTATION Efficient Primitives and Algorithms for Many-core architectures Graphics Processing Units (GPUs) are a fast evolving architecture. Over the last decade their programmability has been harnessed to solve non-graphics tasks—in many cases at a huge performance advantage to CPUs. Unlike CPUs, GPUs have always been a highly parallel architecture—thousands of lightweight execution ...

متن کامل

Sample Sort on Meshes

Sorting on interconnection networks has been solved ‘optimally’. However, the ‘lowerorder’ terms are so large that they dominate the overall time-consumption for many practical problem sizes. Particularly for deterministic algorithms, this is a serious problem. In this paper a refined deterministic sampling strategy is presented, by which the additional term of the presented deterministic sorti...

متن کامل

Accelerating high-order WENO schemes using two heterogeneous GPUs

A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code written in CUDA programming language is developed by modifying a single-GPU code. The OpenMP library is used for parall...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Parallel Processing Letters

دوره 22  شماره 

صفحات  -

تاریخ انتشار 2012